RELIABILITY OF COMPUTER SYSTEMS
ODRA 1305 and R-32


As a result of the rapid growth of the country's computer
production, ODRAs and RIADs are being encountered with increasing
frequency in the most diverse computer centers, among others in
centers which had previously utilized foreign equipment. Following
their installation, the Polish computers begin to be compared
with foreign machines. The operational and technical service
workers relatively soon obtain a picture of the reliability of
the new computer, whereas the management of the computing center
does not for quite some time take cognizance of the existence
of different categories of computer reliability.


Since completing his studies at the Electronics Department of the
Polytechnic School of Warsaw (1961), Master Engineer Wit Drewniak has
been working in the Information section of the Main Statistical Office,
where he is now serving as principal technical service specialist in
the Mechanization and Automation Division of the Statistical Works.
During the years 1962-1977, he was a lecturer at information-oriented
technical high schools in Warsaw.
An article by B. Gliksman entitled "Results Obtained During
Utilization of the JS EMC Computers Installed in ZETO, Katowice",
Informatyka, No. 7-8/78, deals with various reliability classes
within the family of the machines of the Unified System (JS EMC).

From the tables contained in that article, it is possible to compute
the values of time T between failures in relation to utilization
periods extending over many months. These values had the following
pattern for the particular computers used at ZETO, Katowice:

R-20 (first year of use) - 16.1 hours
R-20 (second year of use) - 52.8 hours
R-22 (first year of use) - 42.8 hours
R-32 (11 months of use) - 11.1 hours
R-50 (9 months of use) - 10.5 hours

Thus,  in  the  first  year  of  use  at  ZETO,  Katowice,  the  average 
monthly  times  between  failures  amounted  to  only  between  ten  and 
twenty  hours,  whereas  they  lasted  several  tens  of  hours  in  the 
second  year. 

It should be emphasized that the cited article has served
the technical service staff as a weighty argument during discussions
with the users, who would like to attain times between failures
of the order of several tens of hours already in the first year of
use, regardless of the computers' reliability class.


RELIABILITY INDICES OF COMPUTER UTILIZATION AT OE GUS, WARSAW


We know from utilization experience that absolute perfection
of technical objects (installations) is unattainable; hence these
objects (especially computer installations, in view of their exceptionally
complex structural make-up) are subject to malfunctions.

A certain measure of the reliability of an object (installation,
system) is the average time between two successive malfunctions of
this object in a defined period of use. This time, $T_{\lambda}$, commonly
called the reliability index, is mathematically defined as the mean
value of the random variable that denotes the time of correct
operation between the object's two successive failures. For the
i-th object it assumes the form:

$$T_{\lambda i} = \frac{1}{n_i} \sum_{j=1}^{n_i} t_{ij} \qquad (1)$$

where: $n_i$ = the number of observed failures of the i-th object
in a given period of utilization

$t_{ij}$ = operation time of the i-th object from the end of
repair of the object after the preceding (j-1)-th
failure until the object's next (j-th) failure.

The times $T_{\lambda}$ quoted for the ZETO, Katowice, machines and
the times $T_{\lambda}$ for the machines of the Electronics Center GUS in
Warsaw were computed in accordance with Formula (1).
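The definition above can be illustrated with a minimal Python sketch. It assumes that the reliability index of one machine is the plain arithmetic mean of its failure-free operating intervals, as the definitions of the number of failures and of the operation times suggest. The function name and the sample intervals are hypothetical and are not data from the article.

```python
def mean_time_between_failures(operating_times_hours):
    """Return the mean of the failure-free operating intervals (hours).

    Each interval runs from the end of a repair to the next failure, so
    the number of intervals equals the number of observed failures.
    """
    intervals = list(operating_times_hours)
    if not intervals:
        raise ValueError("need at least one interval ending in a failure")
    if any(t < 0 for t in intervals):
        raise ValueError("operating times must be non-negative")
    return sum(intervals) / len(intervals)


# Hypothetical log for one machine (hours of correct operation between
# the end of one repair and the next failure); not data from the article.
hypothetical_intervals = [12.0, 30.5, 8.0, 21.5]
mtbf = mean_time_between_failures(hypothetical_intervals)
print(f"n = {len(hypothetical_intervals)} failures, MTBF = {mtbf:.1f} h")
```

With these four made-up intervals the total operating time is 72 hours, so the sketch reports 18.0 hours.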

Since 1978, investigations have been conducted at the
Electronics Center GUS in Warsaw into the operational reliability
of the computer systems used there. The results of these investigations
are presented in the Tables that are given below.

From the data below it follows that the characteristic reliability
index $T_{\lambda}$ pertaining to a one-year period of use is in the range of
20-30 hours for the ODRA 1305 computers, 7 to 9 hours for the R-32
machine, about 50 hours (with a decreasing trend in the 12th year
of use) for the ICL 1905 machines, and about 100 hours for the
ICL 1903A machine.


In the English-language literature, the time $T_{\lambda}$ is denoted by the
abbreviation MTBF (Mean Time Between Failures).
Key: 1) month/year; 2) total operating time (hours); 3) total
breakdown time (hours); 4) number of breakdown interruptions;
5) average time between failures (hours).

The value $T_{\lambda}$ for a yearly period of use amounted to 14.8 hours in
1978 (and 15.1 hours in the first year of use).


ICL 1903A, Serial No. 431, Ninth Year of Use

Key: 1) month/year; 2) total operating time (hours); 3) total
breakdown time (hours); 4) number of breakdown interruptions;
5) average time between failures (hours).

The value $T_{\lambda}$ for a yearly period of use amounted to 160.2 hours in
1978 (and 55.9 hours in the preceding year).


ICL 1905, Serial No. 201, Twelfth Year of Use

Key: 1) month/year; 2) total operating time (hours); 3) total breakdown
time (hours); 4) number of breakdown interruptions; 5) average time
between failures (hours).

The value $T_{\lambda}$ for a yearly period of use amounted to 40.0 hours in 1978
(and 72.5 hours in the preceding year).
ODRA 1305, Serial No. 197, Second Year of Use

Key: 1) month/year; 2) total operating time (hours); 3) total
breakdown time (hours); 4) number of breakdown interruptions;
5) average time between failures (hours).

The value $T_{\lambda}$ for a yearly period of use amounted to 20.7 hours in
1978 (and 23.3 hours in the first year of use).


ODRA  1305,  Serial  No.  233,  Second  Year  of  Use  (only  4  months  of 
operation  in  the  preceding  year) 
Key: 1) month/year; 2) total operating time (hours); 3) total
breakdown time (hours); 4) number of breakdown interruptions;
5) average time between failures (hours).

The value $T_{\lambda}$ for a yearly period of use amounted to 29.4 hours in 1978.

R-32,  Serial  No.  021,  Second  Year  of  Use  (8  months  of  operation 
during  the  first  year  of  use) 
Key: 1) month/year; 2) total operating time (hours); 3) total
breakdown time (hours); 4) number of breakdown interruptions;
5) average time between failures (hours).

The value $T_{\lambda}$ for a yearly period of use amounted to 7.1 hours in
1978 (and to 8.8 hours in the previous year).


Notice that for the same type of machine, namely the R-32, the
results obtained by ZETO, Katowice, and by OE GUS, Warsaw, converge.

It could not be otherwise, as reliability is undoubtedly an intrinsic
property of an object, and not a feature of the quality of its
technical maintenance. The technical service may indeed affect
the operation of a computer system for better or for worse, but it
cannot alter the intrinsic properties of this system.

Let us now examine how the results obtained for $T_{\lambda}$, for
the ODRA 1305 and R-32 machines and for foreign machines, relate
to the standards that are obligatory in our country.


NORMATIVE REQUIREMENTS AND DIAGRAMS OF THE RELIABILITY STRUCTURE
OF A COMPUTER SYSTEM

According to the professional standard BN-78/3108-03, the minimum
value of the time $T_{\lambda}$ for computer equipment amounts to 100 hours.

Let us accept that a typical computer system is composed of
the following equipment: 1 processor, 3 blocks of operating memory
(each of 32K words), 1 operator's console (monitor), 1 ("double"
duty) disc memories steering unit, 6 disc memories, 2 tape memories
steering units, 8 tape memories, 2 line printers, 2 card readers,
1 paper tape reader, and 1 paper tape punch.

In utilization practice, reserve equipment is most often kept
running, i.e., it is not disconnected and stands ready to take over
upon a failure of the basic equipment. The repair of any malfunctioning
equipment takes place only after such equipment is disconnected from
operation. Such an arrangement corresponds to a parallel reliability
structure of a computer system, with a so-called loaded reserve
without repair.
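Why a loaded reserve lengthens the time to failure of a set of equipment can be seen from a short derivation. It is only a sketch under assumptions that the article does not state: unit lifetimes are independent and exponentially distributed with a common intensity $\lambda$, all $k+1$ units of a set (the basic one and $k$ reserves) run simultaneously, nothing is repaired, and the set counts as failed only when its last unit has failed.

```latex
% While m units are still running, the next failure arrives at rate m*lambda,
% so the expected waiting time for it is 1/(m*lambda).
\begin{align*}
T_{\text{set}}
  &= \frac{1}{(k+1)\lambda} + \frac{1}{k\lambda} + \dots + \frac{1}{2\lambda} + \frac{1}{\lambda} \\
  &= \frac{1}{\lambda} \sum_{i=1}^{k+1} \frac{1}{i}
\end{align*}
% Example: one reserve unit (k = 1) gives T_set = 1.5 / lambda,
% two reserve units (k = 2) give T_set = (11/6) / lambda.
```

Under these assumptions each additional reserve unit adds less than the previous one, so a single reserve raises the mean time to failure of a set by one half, not by a factor of two.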


If, accepting such a reliability structure of a
computer system, we assume that the typical computer system
given above comprises as reserve equipment one memory block of 32K
words, one disc memories steering unit, one disc memory, one tape
memories steering unit, two tape memories, a printer, and one card
reader; and that, at a given moment, only one piece of equipment of
a given type undergoes failure, then the diagram of the reliability
structure of the computer system under discussion may be represented
as follows:


Figure  1.  Diagram  of  the  reliability  structure  of  a  computer 
system. 

The symbols $\lambda$ characterize the intensity of the malfunctions of

-  the processor ($\lambda_{1}$)
-  the active memory block of 32K words ($\lambda_{2/1}$)
-  the redundant memory block of 32K words ($\lambda_{2/2}$)
-  the operator's console ($\lambda_{3}$)
-  the disc memories steering unit ($\lambda_{4/1}$)
-  the reserve disc memories steering unit ($\lambda_{4/2}$)
-  the disc memory ($\lambda_{5/1}$)
-  the disc memory reserve unit ($\lambda_{5/2}$)
-  the tape memories steering unit ($\lambda_{6/1}$)
-  the tape memories reserve steering unit ($\lambda_{6/2}$)
-  the tape memory ($\lambda_{7/1}$)
-  the two tape memory reserve units ($\lambda_{7/2}$, $\lambda_{7/3}$)
-  the line printer ($\lambda_{8/1}$)
-  the reserve line printer ($\lambda_{8/2}$)
-  the card reader ($\lambda_{9/1}$)
-  the reserve card reader ($\lambda_{9/2}$)
-  the paper tape reader ($\lambda_{10}$)
-  the paper tape punch ($\lambda_{11}$).
Finally, every diagram of the reliability structure can be
put in the form of a series. In our case, the serial diagram of the
reliability of the computer system is represented as follows:

Figure 2. Serial diagram of the reliability structure of
a computer system.


The letter z next to the symbols denotes the replacement (equivalent)
failure intensity for a given set of equipment. The replacement values
$\lambda_{zn}$ are computed from the formula

$$\lambda_{zn} = \frac{1}{T_{\lambda zn}} \qquad (2)$$

following prior computation of $T_{\lambda zn}$ from the formula

$$T_{\lambda zn} = \frac{1}{\lambda} \sum_{i=0}^{k} \frac{1}{i+1} \quad [h] \qquad (3)$$

corresponding to a parallel reliability structure with the reserve
loaded without repair, where:

$\lambda$ = intensity of equipment malfunctions (identical for all
pieces of equipment in a given set)

k = number of reserve equipment units

n = identifier of a set of equipment.
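The replacement values for a set with reserves can be sketched in Python as follows. The sketch assumes that the formula for a loaded reserve without repair has the standard form, in which the mean time of a single unit is multiplied by 1 + 1/2 + ... + 1/(k+1) for k reserve units. The function names are hypothetical.

```python
def equivalent_mtbf(unit_mtbf_hours, reserve_units):
    """Mean time to failure (hours) of a set with `reserve_units` loaded reserves.

    Assumed form: T_z = T_unit * (1 + 1/2 + ... + 1/(k + 1)), k = reserve_units.
    """
    if unit_mtbf_hours <= 0:
        raise ValueError("unit MTBF must be positive")
    if reserve_units < 0:
        raise ValueError("number of reserve units cannot be negative")
    harmonic_sum = sum(1.0 / i for i in range(1, reserve_units + 2))
    return unit_mtbf_hours * harmonic_sum


def equivalent_intensity(unit_mtbf_hours, reserve_units):
    """Replacement failure intensity (1/h): the reciprocal of the set's MTBF."""
    return 1.0 / equivalent_mtbf(unit_mtbf_hours, reserve_units)


# A unit with a 100-hour mean time, with zero, one and two reserve units.
for k in (0, 1, 2):
    print(k, round(equivalent_mtbf(100.0, k), 1), round(equivalent_intensity(100.0, k), 5))
```

For a 100-hour unit this gives 100 hours with no reserve, 150 hours with one reserve and about 183.3 hours with two, which shows the diminishing benefit of each further reserve unit.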


The resulting failure intensity of the computer system, $\lambda_{ws}$,
will be the sum of the intensities of the individual members of
the serial diagram of the reliability structure of the mentioned
system, namely:

$$\lambda_{ws} = \lambda_{1} + \lambda_{z2} + \lambda_{3} + \lambda_{z4} + \lambda_{z5} + \lambda_{z6} + \lambda_{z7} + \lambda_{z8} + \lambda_{z9} + \lambda_{10} + \lambda_{11} \quad [h^{-1}] \qquad (4)$$

On the other hand, the average value of the time between two
successive failures of the computer system, $T_{\lambda ws}$, is computed from
the relation

$$T_{\lambda ws} = \frac{1}{\lambda_{ws}} \quad [h] \qquad (5)$$
Computations of $T_{\lambda ws}$

Let  us  examine  three  different  cases,  namely: 

1)  adoption of the minimal $T_{\lambda}$ values for the individual computer
components;

2)  acceptance  of  the  values  stated  by  national  equipment 
manufacturers; 

3)  acceptance  of  the  values  stated  by  foreign  manufacturers 
of  computer  equipment  (according  to  DATAMATION,  No.  9/78). 


Case  1 

As already mentioned above, the professional standard BN-78/3208-03
defines $T_{\lambda min}$ = 100 hours for any arbitrary computer component. On the
basis of formulas (2), (3), (4), and (5), we can compute the minimum
mean time $T_{\lambda ws\,min}$ [hours] between two successive failures of a
computer system.
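The calculation for Case 1 can be sketched in Python as follows. The sketch rests on two assumptions: each set with reserves is replaced by one equivalent member whose mean time is the unit value multiplied by 1 + 1/2 + ... + 1/(k+1), and the reserve counts are those listed earlier for the typical system. The names `EQUIPMENT_SETS`, `set_intensity` and `system_mtbf` are hypothetical.

```python
# (equipment set, number of reserve units k), following the reserve list
# given for the typical system; sets without a named reserve get k = 0.
EQUIPMENT_SETS = [
    ("processor", 0),
    ("operating memory block", 1),
    ("operator's console", 0),
    ("disc memories steering unit", 1),
    ("disc memory", 1),
    ("tape memories steering unit", 1),
    ("tape memory", 2),
    ("line printer", 1),
    ("card reader", 1),
    ("paper tape reader", 0),
    ("paper tape punch", 0),
]


def set_intensity(unit_mtbf_hours, reserve_units):
    """Replacement intensity (1/h) of one set, assuming a loaded reserve without repair."""
    harmonic_sum = sum(1.0 / i for i in range(1, reserve_units + 2))
    return 1.0 / (unit_mtbf_hours * harmonic_sum)


def system_mtbf(unit_mtbf_by_set):
    """Series structure: add the intensities of all sets, then take the reciprocal."""
    total_intensity = sum(
        set_intensity(unit_mtbf_by_set[name], k) for name, k in EQUIPMENT_SETS
    )
    return 1.0 / total_intensity


# Case 1: every component just meets the 100-hour minimum of the standard.
case1_mtbf = system_mtbf({name: 100.0 for name, _ in EQUIPMENT_SETS})
print(f"{case1_mtbf:.1f} h")  # prints 11.7 h
```

Under these assumptions the sketch yields about 11.7 hours, the same minimum limit that the article quotes in its analysis of the results. The other cases could be tried by passing different values per set, although the article does not say how the combined central unit and memory figure enters its own calculation.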

Case  2 

The suppliers or producers of national computer equipment
give the following values of the times $T_{\lambda}$ for the following equipment:

Central unit ODRA 1305 with an active memory of 32K words - 120 hours
Operator's console (Facit or DZM 180/05) - 1500 hours
Disc memories steering unit (pds 325) - 1000 hours
Disc memory (of Bulgarian manufacture) - 1000 hours
Tape memories steering unit (MTS 25-02) - 450 hours
Tape memory (PT-3) - 500 hours
Line printer (DW 325) - 1000 hours
Card reader (CK 325) - 450 hours
Paper tape reader (CDT 325 - part of the reader) - 500 hours
Paper tape punch (CDT 325 - part of the punch) - 200 hours

For the above data, we obtain a value of the average time
between two successive failures of the national computer system
$T_{\lambda ws}$ = 37.0 hours.


Case  3 

According to the data contained in DATAMATION, No. 9/78,
the typical values of the times $T_{\lambda}$ for particular computer components
in western countries are at present as follows:

Processor - 1000 hours
Internal memory blocks 1MB - 4000 hours
Disc or tape memories steering unit - 3000 hours
Disc memory - 2500 hours
Tape memory - 1000 hours
Line printer - 300 hours

Without committing a major error, we can assume that $T_{\lambda}$
amounts to 500 hours for the paper tape reader, 450 hours for the
card reader, 200 hours for the paper tape punch, and 1500 hours for
the operator's console.

For the above data, we obtain a value of the average time
between two successive failures of a western computer system
$T_{\lambda ws}$ = 72.3 hours.

PRELIMINARY  ANALYSIS  OF  THE  RESULTS  OF  THE  COMPUTATIONS 

If we compare the real $T_{\lambda}$ (see the Tables) for the ODRA 1305
(20-30 hours) and R-32 (7-11 hours) computers with the obtained
time $T_{\lambda ws\,min}$, then we note again an agreement between the
theoretical calculations and the utilization practice.

The ODRA 1305 computers are substantially above the minimum
limit of the time $T_{\lambda}$ (11.7 hours) calculated for a typical computer
system based on the professional standard, whereas the R-32 computers
of ZETO, Katowice, and OE GUS, Warsaw, are below this minimum limit.

A comparison of the value $T_{\lambda ws}$ (37.0 hours), calculated on the
basis of the times $T_{\lambda}$ stated by the national computer equipment
manufacturers, with the real value $T_{\lambda}$ (20-30 hours) for the
ODRA 1305 machines is unfavorable to the latter. This testifies
either to an erroneous calculation of the theoretical parameters of
computer equipment or to an underestimation of the problem in the
manufacturing process. To those who seek to lay all the blame
on the user of the system, and hence on a bad performance of the
technical service personnel, it may be answered that foreign
machines serviced by the same technical staff stand up to all
comparisons with theoretical calculations, and in practice even
surpass the theoretical results.

It is worth adding, at this point, that the setting of $T_{\lambda min}$ = 100
hours in the professional standard for computer components is not
sufficiently challenging to the manufacturers of this equipment;
and therefore, in the stage of creation of a Polish standard,
the level of the requirements should be appropriately raised.


In conclusion, it is also worth drawing attention to the matter
of the software of computer systems and of equipping them with spare
parts. Software (system, application, diagnostic) and spare
parts undoubtedly form an integral part of a computer
system, and must therefore be treated on an equal basis with the
equipment. Regrettably, this is not confirmed in our utilization
practice, as spare parts are chronically lacking, and the
software is only partly and very belatedly modernized.


